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Abstract 

Burkholderia terrae strain BSO0 1 , obtained as an inhabitant of the mycosphere of Laccaria proxima (a close relative of Lyophyllum sp. 
strain Karsten), actively interacts with Lyophyllum sp. strain Karsten. We here sunnmarize the remarkable ecological behavior of B. 
ferrae BS001 in the mycosphere and add key data to this. Moreover, we extensively analyze the approximately 1 1 .5-Mbfive-replicon 
genome of 6. ferraeBSOOl and highlight its remarkable features. Seventy-nine regions of genomic plasticity (RGP), that is, 15.48% of 
the total genome size, were found. One 70.42-kb RGP, RGP76, revealed a typical conjugal element structure, including a full type 4 
secretion system. Comparative analyses across 24 related Burkholderia genomes revealed that 95.66% of the total BS001 genome 
belongs to the variable part, whereas the remaining 4.34% constitutes the core genome. Genes for biofilm formation and several 
secretion systems, under which a type 3 secretion system (T3SS), were found, which is consistent with the hypothesis that T3SSs play a 
role in the interaction with Lyophyllum sp. strain Karsten. The high number of predicted metabolic pathways and membrane 
transporters suggested that strain BS001 can take up and utilize a range of sugars, amino acids and organic acids. In particular, a 
unique glycerol uptake system was found. The BS001 genome further contains genetic systems for the degradation of complex 
organic compounds. Moreover, gene clusters encoding nonribosomal peptide synthetases (NRPS) and hybrid polyketide synthases/ 
NRPS were found, highlighting the potential role of secondary metabolites in the ecology of strain BS001 . The patchwork of genetic 
features observed in the genome is consistent with the notion that 1) horizontal gene transfer is a main driver of B. terrae BS001 
adaptation and 2) the organism is very flexible in its ecological behavior in soil. 
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Introduction 

As a result of the spatial separation in soil between bacterial 
cells and substrate, as well as the general recalcitrancy of sub- 
strate to degradation, soil bacteria most of the time perceive 
conditions of nutrient scarcity in their habitat and occur in a 
state of dormancy. However, there are so-called hot spots in 
soil in which soil bacteria are activated. One such hotspot for 
bacterial activity is the mycosphere, that is, the narrow zone of 
influence around the fungal hyphae in soil. 

Evolutionary and genome-shaping events in soil bacteria 
(mutations and horizontal gene transfer and selective 



processes) are thought to mostly take place in high-activity 
microenvironments, and hence a thorough investigation of 
the genomic diversity of organisms dwelling in the myco- 
sphere is warranted. On the basis of two different experimen- 
tal setups, Burkholderia terrae like organisms were previously 
found to be key associates with fungal hyphae in soil 
(Warmink and van Elsas 2009). Indeed, since this initial discov- 
ery, a fascinating interaction between the mycosphere-iso- 
lated B. terrae strain BS001 and Lyophyllum sp. strain 
Karsten has been unveiled (Warmink and van Elsas 2009; 
Nazir, Zhang, et al. 2012). 
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The draft genome sequence of strain BS001 was recently 
announced (Nazir, Hansen, et al. 2012), but no detail on key 
genetic systems and on how the genomic information deter- 
mines the ecophysiological behavior of the organism was pro- 
vided. From the previous work, circumstantial ecologically 
based evidence was obtained for the contention that the pres- 
ence of type 3 secretion system (T3SS), motility and biofilm 
formation traits as well as glycerol uptake and metabolism 
genes might offer selective advantages to the organism in 
the mycosphere. However, a tight linkage of the ecological 
data with genome features of strain BS001 has been lacking 
so far, in spite of the fact that this linkage is needed to build 
hypotheses that are testable in further experiments. Thus, in 
this study, we provide an in-depth analysis of the genome of 
B. terrae BS001, focusing on the plethora of genetic systems 
that potentially drive the interactions between B. terrae BS001 
and the soil fungi it associates with. As the reference fungus, 
Lyophyllum sp. strain Karsten was used. To achieve our aim of 
assessing the genome in a broad sense and also addressing 
the uniqueness of particular features for strain BS001 , we then 
performed comparative analyses of selected salient features of 
the genome of B. terrae BS001 with those of other (related) 
Burkholderia strains. From the data, we make inferences 
about the implications for the behavior of strain BSOOl in 
soil, in an attempt to narrow the gap between (selected) ge- 
notype and phenotype. 

Results and Discussion 

Genome Properties 

We first briefly describe the overall genome characteristics, 
before considering particular genetic systems that might 
relate to the ecological behavior of B. terrae BSOOl. The 
genome has an estimated size of about 11.5 Mb, and was 
found to consist of five replicons with a G+C content of 
61.8% (Nazir, Hansen, et al. 2012). The genome of B. 
terrae BSOOl is extremely large compared with the genomes 
of other Burkholderia species, that is, B. rhizoxinica HKI454 
(3.75 Mb)(Lackner, Moebius, Partida-Martinez, etal. 201 1), B. 
phytofirmans PsJN (8.21 Mb) (Mitter et al. 2013), and B. phy- 
matum STM815 (8.68 Mb). The total number of predicted 
coding sequences (CDSs) was 12,047, compared with 4,146 
in the genome of B. rhizoxinica HKI454, 8,043 in that of B. 
phytofirmans PsJN, and 8,434 in that of 6. phymatum 
STM815 (table 1). A relatively high number of CDSs, amount- 
ing to 38% of the genome, was predicted to encode proteins 
of unknown function. 

The giant genome size and dense gene coverage suggest 
that strain BSOOl disposes of a suite of lifestyle choices in the 
soil that are selected as a result of the conditions that may 
reign locally in the microhabitat it occupies in soil. In previous 
studies, we found a (one-sided) correlation between fungal- 
interactivity in soil bacteria and the presence of 1) motility 



traits and 2) a T3SS. In the following section, we highlight 
the ecological features of B. terrae BSOOl in soil and explore 
the genome for the presence of these as well as other char- 
acteristics that may correlate with the (fungal-interactive and 
otherwise) lifestyle in soil. 

Ecological Features that Presumably Drive the Genome 
Structure of B. terrae BSOOl 

Salient ecological features of B. terrae BSOOl in soil and the 
mycosphere are summarized in supplementary table SI, 
Supplementary Material online. Burkholderia terrae was 
found to be a key inhabitant of the mycosphere of Laccaria 
proxima in the field, whereas B. terrae BSOOl was an avid mi- 
grator with growing Lyophyllum sp. strain Karsten mycelium. 
The rapid migration along the mycelial growth front endowed 
strain BSOOl with the remarkable capacity to successfully reach 
novel (remote) microhabitats in the soil (Warmink and van Elsas 
2009). Moreover, strain BSOOl revealed a migration helper 
effect, as it stimulated the comigration of the nonmigrator 
Dyella japonica BS003 along with the fungal hyphae 
(Warmink et al. 201 1 ). The organism further showed avid bio- 
film formation around hyphae of the fungal host. The extra- 
cellular polysaccharides constituting the biofilm are thought to 
act like a shield against antifungal agents (Warmink and van 
Elsas 2009), allowing protection of the fungal host. 
Burkholderia terrae BSOOl was further found to be able to 
comigrate with a range of other selected soil fungi through 
soil, allowing the notion that it is a "generalist" migrator. The 
fungal-interactive capacity was suggested to involve bacterial 
motility as well as the activity of a T3SS (Warmink and van Elsas 
2009). Moreover, considering resource utilization, B. terrae 
BSOOl — much like Variovorax paradoxus HB44 — was found 
to grow on Lyophyllum sp. strain Karsten released glycerol 
(tested in liquid microcosms). The organism was even able to 
trigger the release of glycerol from host cells by an as-yet-un- 
known mechanism (Boersma et al. 2010; Nazir et al. 2013). 
Furthermore, it grew avidly on glycerol as the sole carbon 
source (data not shown). In this sense, B. terrae BSOOl is 
likely to be an excellent competitor in soil in competitive situ- 
ations where carbon sources such as glycerol become avail- 
able. Finally, B. terrae BSOOl was indicated to affect the 
physiology of Lyophyllum sp. strain Karsten, including the in- 
duction of a secretome that contained high levels of glycerol, 
next to the inhibition of primordium formation by fungal my- 
celial mats in the microcosms (Nazir et al. 201 3). 

Core and Pan Genome Analyses 

We used the "MicroScope" platform (Vallenet et al. 2013) to 
determine the genetic landscape of the 6. terrae BSOOl 
genome, taking as references the available (MicroScope) ge- 
nomes of 23 other Burkholderia species, that is, B. phymatum 
STM815, B. phytofirmans PsJN, B. rhizoxinica HKI454, 
fi. glumae (strains; AU6208, LMG 2196, BGR1, and 3252- 
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Table 1 

Comparison and General Features of Burkholderia Genomes 



Bacterial specie 


B. terrae 


S. rhizoxinica 


B. phytofirmans 


S. phymatum 




BS001 


HKI454 


PsJN 


STIMSIS 


Genome size (Mb) 


11.5 


3.75 


8.21 


8.68 


GC content (%) 


61.8 


60.7 


62.2 


62.3 


Number of CDSs 


12,047 


4,146 


8,043 


8,434 


tRNAs 


51 


59 


63 


62 


rRNA operons 


4 


9 


18 


16 



8), Burkholderia sp. (strains; CCGE1002, CCGE1003, and 
TJ149), B. lata 383, B. kururiensis Ml 30, B. thailandensis 
E264, B. mallei ATCC 23344, B. pseudomallel K96243, B. 
cenocepacia AU 1054, B. phenoliruptrix BR34S9a, B. ambifaria 
AMMD, B. gladioli {strains; 3848s-5 and BSR3), B. vietnamien- 
sls G4, B. multlvorans ATCC 1 761 6, and B. xenovorans LB400. 
First, B. terrae BS001 had the largest genome of all (1 1 .5 Mb 
compared with 9.7 Mb for 6. xenovorans LB400), indicating a 
large accessory gene part. The pan genome across these ge- 
nomes is defined as the sum of the core genome (genes pre- 
sent in all twenty four organisms), the variable genomes 
(genes found in some organisms while being absent from 
the others) and the strain-specific genes (Medini et al. 
2005), whereas the core genome is the collective number of 
shared genes between all genomes. The analysis revealed a 
total of 1 80, 1 24 CDSs to be present in the pan genome of the 
24 genomes, which were divided over 81,027 MicroScope 
gene families (MICFAM; see Materials and Methods section). 
The core genome across all Burkholderia species comprised 
472 gene families; however, the exact number of core CDSs 
was different across the compared genomes, as a result of 
some core CDSs having more than one copies. Thus, strain 
BS001 was found to have a core of 523 CDSs as several core 
genes had paralogs. The 523 CDSs core genome of strain 
BS001 encompassed only 4.34% of its total genome. The 
remaining 95.55% (11,522 CDSs) thus constituted the vari- 
able or accessory part. Of these, there were 6,099 (50.53%) 
strain-specific CDSs in strain BS001 whereas the remainder 
(-5,000) were volatile, meaning that they were present 
across a limited number of genomes (supplementary table 
S2, Supplementary Material online). Clearly, as found by 
others, we also found that the sizes of the core and pan ge- 
nomes depended strongly on the number of genomes ana- 
lyzed, resulting in shrinking core and expanding pan genomes 
with depth of genome sampling (fig. 1). Also, across the 
Burkholderia strains that were analyzed, the pan genome 
should be considered to be open, as, with every new 
genome sequenced, several hundreds of novel genes are 
found (data not shown). Our analysis further revealed that 
the core genome across all 24 Burkholderia genomes con- 
sisted of CDSs encoding key cellular functions such as DNA 
recombination, replication, metabolism, transcription, transla- 
tion, glycolysis, amino acid activation (tRNAs), chaperoning. 



RNA modification, transcription regulation, DNA repair, fatty 
acid biosynthesis, peptidoglycan biosynthesis, posttransla- 
tional modification, and cell division. Overall, the level of ho- 
mology (translated amino acid sequences) of the genome of 6. 
ferrae strain BS001 with these genomes was between 55% 
and 60%. 

Phylogeny and Synteny Groups 

We first constructed a maximum-likelihood tree on the basis 
of seven concatenated genes for representative Burkholderia 
species from groups A (planfsoil related, nonpathogens) and 
B (pathogens) (Estrada-de los Santos et al. 2013) to locate the 
position of strain BS001. Indeed, strain BS001 belonged to 
group A, being very closely related to B. phymatum STM815 
(fig. 2). In this analysis, we further found that B. rhizoxinica HKI 
454 could not be related to any of the other Burkholderia 
groups, as also obsen/ed by Estrada-de los Santos et al. 
(2013). We then performed an analysis of synteny between 
the strain BS001 genome versus selected available genomes in 
the NCBI (National Center for Biotechnology Information) 
RefSeq (reference sequence) database. The analyses (fig. 3) 
revealed that the B. phymatum STM81 5 genome had highest 
CDSs synteny to the BS001 genome, that is, 76.78% (fig. 2), 
followed by B. graminis C4D1 M, B. phytofirmans PsJN, and B. 
kururiensis Ml 30 (69.47%, 67.97%, and 64.79%, respec- 
tively). These four Burkholderia species are members of 
Burkholderia group A (Estrada-de los Santos et al. 2013) to 
which B. terrae BS001 also belongs (Nazir, Zhang, et al. 2012). 
A trend of decreasing % CDSs synteny was observed as we 
moved from Burkholderia group A members to those of group 
B, that is, B. cenocepacia AU 1056 (58.66%), B. pseudomallel 
K96243 (58.1%), B. mallei ATCC 23344 (55.49%), and 
B. vietnamiensis G4 (51.34%) (fig. 3). The levels of synteny 
of the outgroups, that is, Ralstonia, Cupriavidus, and 
Escherichia, with the genome of the strain BS001, were 
much lower. These findings suggest that the conservation of 
synteny was highest among members of the group A 
Burkholderia and it gradually decreased as the phylogenetic 
distance increased. 

Secretion Systems and Effector Proteins 

Secretion systems such as the T3SS may be important for 
bacteria when these interact with hosts such as fungi 
(Lackner, Moebius, Hertweck, et al. 2011). We investigated 
the genome of B. terrae BS001 for the presence of (protein) 
secretion systems, which have a potential role in the interac- 
tion with Lyophyllum sp. strain Karsten. Basically, we found all 
known protein secretion systems, that is, the type 1 secretion 
system (T1 SS) through to the type 6 secretion system (T5SS), in 
the genome of B. terrae BS001. The T1SS is normally com- 
posed of three constituents, including ATP-binding cassette 
(ABC) transporters, membrane fusion proteins, and outer 
membrane proteins (Delepelaire 2004; Holland et al. 2005). 
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Fig. 1. — Core-pan genome size evolution. {A) Pan genome: Pan genome size is directly proportional to the number of genomes. (6) Core genome: Core 
genome size is inversely proportional to the number of genomes. 



The BS001 genome harbored ABC transporters and related 
proteins that are part of the T1 SS (discussed later in detail). 

The type 2 secretion system (T2SS) (constituted of 9-1 1 
genes), which is used to translocate a wide range of proteins, 
was present as three gene clusters, numbered 1 
(AKAUv1_257005-AKAUv1_2570061), 2 (AKAUv1_25701 
49-AKAUv1_2570159), and 3 (AKAUv1_7901 12-AKAUv1_ 
790123) across the genome. The latter (11-genes) cluster 
was completely syntenous and highly homologous (>80% 
amino acid identity [AAI]) to a similar system in B. phymatum 
STM815. The BS001 genome also revealed the presence of 
two copies of a subtype of the T2SS, the Tad (tight adherence) 
macromolecular transport system. Tad genes encode a ma- 
chinery that is required for the assembly of adhesive fimbrial 
low-molecular-weight protein (Flp) pili. The machinery is asso- 
ciated with a so-called widespread colonization island (Tomich 
et al. 2007). The Flp pili play crucial roles in the formation of 
biofilms, colonization, and pathogenesis across several bacte- 
rial genera (Tomich et al. 2007). The first 14-genes ted cluster 
(AKAUv1_990039-AKAUv1_990052) was highly syntenous 
and homologous to similar systems in B. phytofirmans 
PsJN (-60-70% AAI) and B. phymatum STM81 5 (-70-75% 
AAI). The second, 13-gene, tad copy (AKAUv1_550027- 
AKAUv1_550039), located at a different position, also 
had highest synteny with a second system in 



B. phymatum STM815 (>70% AAI) and B. phytofirmans 
PsJN (>55% AAI). 

The T3SS has a crucial role in the virulence of several plant 
and human pathogens (Tseng et al. 2009). It may also be 
involved in bacterial-fungal interactions, in particular aiding 
bacteria in their migratory response to extending fungal 
hyphae in the mycosphere (Warmink and van Elsas 2009). 
Indeed, the few bacterial types that were able to migrate 
along with the hyphal front of Lyophyllum sp. strain Karsten 
in soil microcosms were all positive for the presence of the 
T3SS (Warmink and van Elsas 2009). The T3SS makes mem- 
brane-bound complexes that can secrete effector proteins into 
eukaryotic host cells. The B. terrae BS001 genome possesses 
one T3SS gene cluster, containing 22 canonical T3SS genes 
(AKAUv1_540178-AKAUv1_540199). Out of these, nine 
were found to be highly conserved, making up the proposed 
core apparatus (SctSRQVUJNTO, however without a canonical 
gene for the "needle." Interestingly, the system was found to 
be highly syntenous and homologous (60-65% AAI) to the 
T3SS of the fungal-interactive strain 6. rhizoxinica HKI454 as 
well as that of B. glumae BGR1 (fig. 4). Remarkably, a CDS 
with unknown function (AKAUv1_540179) was found, of 
which the location and length compelled us to argue that it 
encodes a needle protein. Based on PSIPRED predicted a-helix 
structures, we assume that this CDS is a functional homolog 
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Fig. 2. — A maximum-likelihood tree illustrating the relationship of selected Burkholdeha species. The tree is based on alignment of seven concatenated 
core genes {aroE, dnaE, groeL, gyrB, mutL, recA, and rpoB) and was generated using MEGA. Burkholdeha terrae BS001 falls within group A and is closely 
related to 6. phymatum STM81 5. Bootstrap values (more than 50%) are shown at each node. Group A and group B are defined by Estrada-de los Santos et 
al. 2013. 



of SctF (B. rhizoxinica HKI454) or its Ralstonia counterpart 
Hrpy (Lackner, Moebius, Hertweck, et al. 2011). To under- 
stand the evolutionary position of the T3SS of strain BS001, 
we constructed a phylogenetic tree (fig. 5) using eight con- 
catenated core genes {SctS, SctR, SctQ, SctV, SctU, ScU, SctN, 
and SctV) that were conserved across the tested genomes. The 
analysis supports the conclusion that the T3SS of B. terrae 
BS001 belongs to the so-called Hrp2 family, to which most 
of the Burkholderia and Ralstonia T3SSs belong (Abby and 
Rocha 2012). The TBSSs of B. terrae BS001 and B. rhizoxinica 
HKI454 might fulfill similar roles. Both strains interact actively 
with fungal hosts and there is evidence that T3SSs play a role 
in the interaction in both cases (Lackner, Moebius, Hertweck, 
et al. 201 1; Warmink and van Elsas 2008). 

The type 4 secretion system (T4SS) is responsible for secret- 
ing proteins and, most importantly, nucleic acids (coupled to 
proteins; nucleoproteins) across the inner and outer mem- 
branes into recipient cells or into the external milieu. It plays 



a key role in the transfer of plasmids or integrated chromo- 
somal elements, next to being involved in virulence on animals 
(protein secretion, e.g., in l-lelicobacter pylori) or plants (plas- 
mid transfer, e.g., in Agrobacterium tumefaciens) (Dale and 
Moran 2005). Remarkably, the genome of B. terrae BS001 
was found to contain three T4SS gene clusters. The first 
one, T4SS-I (20 CDSs; AKAUv1_3130024-AKAUv1_ 
3130044), with all canonical T4SS functions (fig. 6), is present 
on the 70.42-kb region of genome plasticity (RGP) RGP76. The 
predicted \//r670geneof T4SS-I had very high AAls with similar 
genes from a Burkholderia sp. BT03 plasmid (94%) and the B. 
phymatunn STM815 genome (71%). Furthermore, non-T4SS 
open reading frames (ORFs) on RGP76 were predicted to 
encode a DNA circulation family protein, DNA replication pro- 
tein tnpB, transposase tnpA, IS5 transposase insH, next to 
seven integrases, including two phage family integrases. 
Further, at the downstream boundary of RGP76, four trans- 
posases and two integrases were localized. The T4SS was 
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Burkholderia species including 6. rhizoxinica HKI454, B. phenoliruptrix 3459a, B. kururiensis Ml 30 and 6. glumae BGR1. 



completely syntenous and highly homologous (70-75%) with 
a B. phymatum STM815 T4SS located on plasmid pBPHY02 
(595, 108 bp). Moreover, the regions flanking RGP76 also con- 
tained CDSs potentially involved in gene mobility, that is, ORFs 
similar to the typical plasmid replicon genes parA, parB, parD, 
parE, mobB, mobC, repB, korC, and kIcA. RGP76 further car- 
ried several accessory genes, that is, those predicted to func- 
tion as a putative xanthine dehydrogenase/carbon monoxide 
dehydrogenase acceptor (AKAUv1_307001 5, AKAUv1_ 
3070016) and molybdenum binding xanthine dehydrogenase 



(AKAUv1_31 3001 7). The remaining accessory ORFs were pre- 
dicted to encode conserved proteins of unknown functions. 

A second system, T4SS-II, containing 13 genes, was 
found to be present elsewhere in the genome 
(AKAUv1_1810006-AKAUv1_1810019). This cluster had a 
gene composition different from the first one, as it contained 
the virB8, virB9, and virBlO genes but lacked other wr genes. 
Instead, it contained CDSs for lytic transglycosylase (two CDSs), 
pilL, toxin coregulated pilus biosynthesis protein Q, pulD, pulE, 
pu/F (T2SS), and conserved hypothetical proteins genes. 
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(22% homology), was closely related (76% AAI) to a T4SS of 
Burkholderia sp. SJ98 (supplementary fig. SI, Supplementary 
Material online). 

The B. terrae BS001 genome further revealed the presence 
of four copies of predicted T6SSs, numbered T6SS-I through 
T6SS-IV (12-25 genes). T6SS-1 (16 genes; AKAUv1_650058- 
AKAUv1_650074) had highest synteny and homology 
(>40%) with a genomic region of B. gladioli BSR3. T6SS-II 
(19 genes; AKAUv1_1080195-AKAUv1_1080214) was 
highly syntenous and homologous (85-95%) with a similar 
cluster of B. phymatum STM815. T6SS-III and T6SS-IV 
comprised lower numbers of genes, that is, 8 (AKAUv1_ 
1680089-AKAUv1_1 680097) and 5 (AKAUv1_920001- 
AKAUv1_920005), respectively. T6SS-III had highest synteny 
with clusters in B. phenoliruptrix BR3459a and B. kururiensis 
Ml 30 (both 55-60% homology). T6SS-IV, found on RGP25, 
was syntenous with five-genes clusters (40-50% homology) 
in 6. glumae strains BGR1 and AU6208 and Burkholderia sp. 
CCGE 1002 (supplementary fig. S2, Supplementary Material 
online). 

The B. terrae BS001 genome also contained several genes 
for signal peptide bearing proteins (supplementary table S3, 
Supplementary Material online) that may be exported in a sec- 
dependent fashion. Among these, the so-called type II 
secreted proteins fall in the category of lytic and toxin-related 
proteins (Cianciotto 2005). Furthermore, putative ppiA 
(encoding peptidyl-prolyl cis-trans isomerase A; rotamase A; 
AKAUv1_720068), next to genes for parvulin-type peptidyl- 
prolyl cis-trans isomerase (AKAUv1_1 270066), and peptidyl- 
prolyl cis-trans isomerase (AKAUv1_1390107) were found. 
The latter protein may have pathogenicity-enhancing abilities 
in Legionella pneumophila and Xanthomonas campestris pv. 
campestris 8004 (Fischer et al. 1992; Zang et al. 2007). It has 
also been reported to exist in B. rhizoxinica HKI454 (Lackner, 
Moebius, Partida-Martinez, et al. 2011). We also found gene 
iagB (AKAUv1_790111) and its three duplicates (45-65% 
AAI), which produces an invasion protein and may be involved 
in the invasion of eukaryotic host cells by Shigella flexneri and 
Salmonella enterica serovar typhi (Allaoui et al. 1993; Miras 
et al. 1995). Three genes predicted to encode ankyrin 
(AKAUv1_540128, AKAUv1_920061 , AKAUv1_1 270092) 
and tetratricopeptide (TPR) repeat containing proteins were 
also present in the BS001 genome. These proteins may be 
involved in protein-protein interactions as host interactive fac- 
tors (Edqvist et al. 2006). Both ankyrin and TPR repeat proteins 
have also been reported in B. rhizoxinica HKI454 (Lackner, 
Moebius, Partida-Martinez, et al. 2011). 

Putative T3SS secreted effector proteins were then searched 
for using www.effectors.org (last accessed June 26, 2014) 
(Jehl et al. 2011) (supplementary table S4, Supplementary 
Material online), which were analyzed with respect to the pu- 
tative interaction of 6. terrae BS001 with Lyophyllum sp. strain 
Karsten. These effectors need confirmation, as we ignore their 
functions at this moment. Nevertheless, we found putative 



phytoene synthase (AKAUv1_920106), of the squalene- 
hopene cyclase enzyme family and may be involved in the 
biosynthesis of terpenoids. This is consistent with the presump- 
tion that terpenoids mediate the interactions of B. terrae BS001 
with its host, like reported for 6. rhizoxinica HKI454 in its inter- 
action with the fungal host (Lackner, Moebius, Partida- 
Martinez, et al. 2011). Among the predicted effectors we 
also found flagellin proteins (coined pathogen-associated mo- 
lecular patterns) both in plants and in mammals (Ronald and 
Beutler 2010; Wei etal. 2013). 

Pilus, Biofilm Formation, Motility, and Insecticidal Toxin 
Complexes 

The genome of B. terrae BS001 further carries gene clusters 
predicted to encode flagellar biosynthesis, biofilm formation, 
motility, and (seven genes) type 4 pili. The latter pill may play a 
role at fungal surfaces, as, once attached to the surface, bac- 
teria can "walk" on it by employing the action of such pili 
(Conrad 2012). Genes that encode such pili, that is, pilMNQ 
(AKAUv1_1 170009, AKAUv1_1 170010, AKAUv1_1 170012), 
were present in a cluster. We also found pilin precursor gene 
PilA (AKAUv1_2 170003), signal peptidase PUD (AKAUv1_ 
2940033), NTP-binding protein PUB (AKAUv1_2940037), 
and P/yC(AKAUv1_2940035) in the genome. 

Bacterial biofilm formation is dependent on the ability to 
produce extracellular matrix which may be composed of poly- 
mers like poly-beta-1,6-A/-acetyl-D-glucosamine (PGA) (Izano 
et al. 2007; Bosse et al. 2010). We found several ORFs pre- 
dicted to be involved in PGA production, that is, pgaA (PGA 
synthesis protein; AKAUv1_1000016), pgaB (PGA synthesis 
lipoprotein; AKAUv1_1000017), and pgaC (PGA synthesis 
AZ-glycosyltransferase; AKAUv1_1000018). In addition, two 
other biofilm synthesis gene clusters (both representing a Pel 
operon homolog) were identified. The Pel operon provides a 
scaffold in the early stage of biofilm formation by 
Pseudomonas aeruginosa PA14, (Colvin et al. 2011). Pel is 
thought to encompass a glucose-rich polysaccharide polymer 
that is encoded by a seven-genes operon (pelA-F) (Friedman 
and Kolter 2004). Cluster 1 comprised pelABCDEFGA 
(AKAUv1_2280066-AKAUv1_2280073), and cluster 2 
pelGFEDCBA (AKAUv1_100003-AKAUv1_1 00009). Thus 
the Pel region may have undergone a recent duplication fol- 
lowed by some reshuffling, as evident from our analysis. Two 
genes, taken as proxies in the two systems, that is, peIG and 
pelF, were 61 .84% and 64.69% identical, respectively, 
whereas pelABCDE had lower (43.52%, 36.33%, 47.9%, 
40.08%, and 45.55%) AAI. An extensive screening further 
revealed the presence of ten other CDSs (i.e., 4 epsA, 4 
epsB, and 2 epsP) in the genome that are potentially involved 
in the biosynthesis of other exopolysaccharide (EPS). Overall, 
the presence of the pel gene clusters reflects another capacity 
of B. terrae BS001 to form biofilms during its interaction with 
Lyophyllum sp. strain Karsten (as well as other hosts) in soil. 
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Moreover, the B. terrae BS001 genome harbors one gene 
cluster that encodes flagellar biogenesis proteins. The cluster 
revealed the presence of 46 genes (AKAUv1_1 20002- 
AKAUv1_1 20048) in a region that was highly syntenous 
with regions of the STM815 (80-90% AAI) and PsJN 
(75-85% AAI) genomes. Moreover, the BS001 genome con- 
tained a gene cluster comprised 1 1 genes (AKAUv1_790001- 
AKAUv1_790011) associated with bacterial chemotaxis and 
motility. Again, this cluster was syntenous with a similar one in 
strain STM815 (85-95% AAI). Elsewhere in the genome, an- 
other nine flagellar genes {fIhA, flhB, 2 [flhC], 3 [flhD], fliC, and 
fliD) were found, next to a seven-genes chemotaxis/motility 
cluster (AKAUv1_920023-AKAUv1_920030). 

Interestingly, the genome of 8. terrae BS001 harbored two 
ORFs (AKAUv1_2 130030 and AKAUv1_2 130031) coding for 
a putative insecticidal toxin complex, as also reported to be 
present on the 822-kb megaplasmid pBRHOI in B. rhizoxinica 
HKI454 (Lackner, Moebius, Partida-Martinez, et al. 201 1). We 
analyzed the putative horizontal mobility of such ORFs. 
Indeed, the two ORFs formed part of the largest genomic 
island, RGP53 (102,873 bp), suggesting that they may have 
been acquired horizontally. Thus strain BS001 might have 
insect pathogenicity traits, indicating its putative interaction 
with soil insects. 

Genomic Islands across the Genome of B. terrae BS001 

We employed "MicroScope" (Vallenet et al. 2013) to predict 
the presence of RGPs in the BS001 genome. The analysis pre- 
dicted the existence of 79 RGPs, which were scattered across 
the genome (supplementary table S5, Supplementary Material 
online). The total size of the collective RGPs was 1 ,896,071 bp, 
accounting for 1 6.48% of the genome. Among the RGPs, we 
found the aforementioned 70.42 kb plasmid-like RGP76 
(AKAUv1_3070018-AKAUv1_3130074). An overview of ten 
important RGPs with plasmid-related and functional genes is 
given in supplementary table S6, Supplementary Material 
online. 

Briefly, on RGP34, we identified two ORFs 
(AKAUv1_1 280007 and AKAUv1_1 280008) encoding puta- 
tive addiction proteins (toxin and antitoxin). In this region, four 
ORFs (AKAUv1_1 930029, AKAUv1_1 930030, AKAUv1_ 
1 2301 00, and AKAUvl _1 2301 01) encoding HigA, HigB (plas- 
mid maintenance protein), StbD and StbE (stabilization 
proteins) were found just outside of the RGP. Similarly, puta- 
tive plasmid segregation antitoxin CcdA and CcdB toxin 
proteins are encoded by ORFs AKAUv1_241001 5 and 
AKAUv1_2410016, both located outside plasticity regions. 
On the other hand RGP61 harbored two ORFs, 
AKAUv1_2490012 and AKAUvl _24900 13, which encoded 
CcdB and CcdA. Downstream of these genes, the region har- 
bored an ORF (AKAUvl _2 49001 5) predicted to encode a pro- 
tein having 39% (coverage 77%) homology with cryptic 
plasmid protein A from Neisseria gonorrhoeae NCCP1 1945. 



Interestingly, it also revealed an ORF (AKAUvl _2490048) that 
encoded a beta-lactamase (with 55% and 51 % similarity to a 
similar protein in Bradyrhizobium SG-6C and R. etii CRN 42, 
respectively). Furthermore, we identified putative parE and 
parD genes (AKAUvl _2860003 and AKAUvl _2860004) in 
RGP71, whereas another ORF (AKAUvl _330002) encoding 
"prevent host death" family protein; Phd was present out- 
side the RGPs. We also identified H/gfi and IHigA 
(AKAUv1_360025 and AKAUv1_360026) as well as MazE 
and IVIazF addiction modules (AKAUv1_920172 and 
AKAUv1_920173), outside the predicted RGPs. Such toxin- 
antitoxin system proteins play a role in mediating stability of 
plasmids by imposition of an addiction mechanism to the host 
(Arcus et al. 2005). They may also play a role in mediating 
growth arrest during stress situations (Gerdes et al. 2005). 

The largest RGP, RGP53, was found to span 102,873 bp 
(AKAUv 1 _2 1 200 17-AKAUv1_2 140005), carrying ORFs for 
seven transposases and one integrase. A screen for functional 
genes in this RGP revealed the presence of one gene sip 
encoding a siderophore interacting protein (SIP) 
(AKAUv1_2130016) highly similar to a sip from Burl<holderia 
sp. BT03 (99%). The gene had a duplicate downstream 
(AKAUvl _2 130093), having (99%) similarity with sip of 
Burl<holderia sp. BT03. Another ORF (AKAUv 1_2 130052) 
was predicted to encode invasion plasmid antigen J (IpaJ); it 
had 35% homology with IpaJ from S. flexneri 2002017. 
RGP53 also carried a gene for type III effector protein l-iopJ 
(AKAUvl _21 30053); it was 53% homologous to similar gene 
from Salinivibrio costicola. It appears to furnish a wealth of 
secondary functions to its host, which may be useful in a wide 
range of ecological conditions. The clear mobility and diver- 
gent accessory nature of RGP53 might suggest that it is an 
ecological flexibility island. 

Primary Metabolites and Metabolic Pathways 

We used the "MicroCyc" metabolic database (Caspi et al. 
2010) to compare the metabolic potential in the genome of 
S. terrae BS001 with that of other Burl<holderia genomes (sup- 
plementary table S7, Supplementary Material online). 

Based on the values computed for pathway completion, 
hierarchical cluster analysis was performed for the 6. terrae 
BS001 genome next to 23 other genomes including next- 
of-kin Burkholderia species (supplementary fig. S3, 
Supplementary Material online). The analysis revealed that 
the Burl<holderia strains tended to group together, B. terrae 
BS001 being close to B. kururiensis Ml 30 and 6. pliymatum 
STM815, whereas B. phytofirmans PsJN grouped with B. viet- 
namiensis G4 and 6. ambifaria AMMD. Interestingly, B. rhizox- 
inica HKI454 grouped outside of the main Burkholderia 
cluster; however, this might have been compromised by its 
smaller genome size. 

Burkholderia terrae BS001 clearly invested in diverse pri- 
mary metabolisms, as evidenced from the presence of more 
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than 1,200 putative CDSs for carbohydrate metabolism 
(Nazir, Hansen, et al. 2012). Specifically, we found a puta- 
tive glycerol l<inase, gIpK (AKAUv1_1 300029), which was 
preceded by a glycerol-3-phosphate dehydrogenase iglpD, 
AKAUv1_1 300028). 

Remarl<ably, in the B. terrae BS001 genome, we found a 
gene for a specific glycerol uptake family transporter, denoted 
GUP. This system was absent from all other Burkholderia ge- 
nomes and it is actually typical for the yeast Saccharomyces 
cerevisiae. In this organism, active uptake of glycerol occurs 
mainly by GUP 1 and its close homolog GUP 2, which both 
encode multimembrane-spanning proteins (Hoist et al. 2000). 
Interestingly, the gene for GUP (AKAUv1_1930108) was 
present on RGP47, showing highest AAI (49%) with 
Magnetospirillum magneticum AMB-1 (fig. 7). The upstream 
ORF AKAUvl _1 9301 07 encodes protein of unknown function 
that also shares 25% AAI with a hypothetical protein from 
M. magneticum AMB-1. The unique presence of this GUP 
gene in B. terrae BS001 is consistent with the hypothesis 
that fungal-exuded glycerol (Nazir et al. 2013), which feeds 
the bacterium, may be taken up through it. The presence of 
this system on RGP47 may indicate that the bacterium ac- 
quired and fixed it in its quest to thrive in the mycosphere at 
the expense of exuded glycerol. 

Genes for glycerol degradation pathways I and V were pre- 
dicted to be present in the genome of B. terrae BS001. In 
comparison with the comparator strains, glycerol degradation 
pathway V was only complete in B. terrae BS001, whereas it 
was incomplete in strains PsJN and STM815 while strain 
HKI454 completely lacked it. Pathway V encompasses two 
reactions, and only the B. terrae BS001 genome carried 
ORFs (AKAUvl _7603 19, AKAUv1_244001 1 , and AKAUv1_ 
1890037) encoding enzymes catalyzing these. The first two 
ORFs are duplicates of each other, having 42.9% sequence 
identity. 

Strikingly, the BS001 genome further carried ORF 
AKAUvl _1 970009, which is predicted to encode a tyrosinase 
(involved in L-dopachrome biosynthesis), present on RGP49. 
We analyzed the potential phylogenetic history of this gene 
using a maximum-likelihood tree (supplementary fig. S4, 
Supplementary Material online) with closest hits in the data- 
base. Blasting the whole NCBI database only yielded four close 
hits, with highest AAI (73%) with Anaeromyxobacter dehalo- 
genans 2CP-1. The ORF directly upstream in BS001, 
AKAUvl _1 970008, was predicted to encode a twin-arginine 
signal translocation protein (Tat). This gene also occurred 
(55% AAI) directly upstream in A. dehalogenans 2CP-1. 
However, the region downstream of ORF AKAUv1_1 970010 
was different. The presence of the Tat gene upstream of the 
tyrosinase gene is consistent with the fact that the tyrosinase 
may require a signal peptide containing transactivator for its 
export and translocation (Berks et al. 2000; Schaerlaekens 
et al. 2001). This commonality between the rather unique B. 
terrae system and one in A. dehalogenans is striking. The latter 



organism belongs to a very different bacterial clade (delta- 
Proteobacteria) than B. terrae BS001; however, it occurs in 
the same habitat (soil) and is a very motile (gliding) bacterium 
(Thomas et al. 2008). Moreover, its mosaic genome has been 
postulated to be composed of parts coming from diverse pro- 
teobacteria. Based on this analysis and the uniqueness of the 
commonality, we postulate that the gene for tyrosinase may 
have been acquired from an organism like A. dehalogenans 
2CP-1 through horizontal gene transfer. i-Dopachrome is the 
building block of melanin, which can protect against ultravio- 
let radiation (Liu et al. 2004). Moreover, melanin in the cell 
wall of Aspergillus nidulans has been correlated to resistance 
against digestion by chitinases, glucanases, and proteases 
(Kuo and Alexander 1967). Bacterial melanin has been 
shown to interact with double-stranded DNA and its cellular 
localization may inhibit cell metabolism (Geng et al. 2010). 
Bacterial tyrosinases may also act on particular phenolic com- 
pounds, such as chlorophenols (Marino et al. 201 1) and fluor- 
ophenols (Battaini et al. 2002). The role of the unique 
tyrosinase found in the B. terrae BS001 genome is not well 
understood, but it may serve the organism to cope with stress- 
ful environments and detoxify fungal/plant phenolic 
compounds. 

Furthermore, the B. terrae BS001 genome revealed an 
enormous biosynthetic potential for a wide range of amino 
acids, see supplementary table S6, Supplementary Material 
online, indicating its flexibility in synthesizing the essential cel- 
lular building blocks of different kinds. 

Plant-Interactive Traits 

In the B. terrae BS001 genome, we further found biosynthetic 
potential for plant hormones, that is, genes for ethylene bio- 
synthesis pathway III (two out of three enzymes are present) 
and indole acetic acid (lAA) biosynthesis IV and V, whereas 
lAA biosynthesis pathway VI is partially present. This may sug- 
gest that B. terrae BS001 spends (part of) its life occupying a 
niche close to plants like the rhizosphere. The presence of 
three copies of a 1-aminocyclopropane-1-carboxylate deami- 
nase (ACC deaminase— acdS) gene (AKAUvl _1 50009, 
AKAUv1_510080, and AKAUvl _780047), denoted acdS-l, 
acdS-ll and acdS-lll, gives support to this hypothesis. 
Maximum-likelihood phylogenetic analyses showed that 
acdS-lll clustered separately from the other two genes and 
might have been acquired through horizontal gene transfer 
(HGT). In contrast, acdS-l and acdS-W may be the result of a 
duplication very early in evolution (supplementary fig. S5, 
Supplementary Material online). 

Similarly, RGP53 was found to carry two CDSs, here 
denoted as pnodW (putative nodW genes) (AKAUvl _ 
2130064 and AKAUvl _2 130080) that may act as two-com- 
ponent response regulators, with 81% AAI to a putative 
two-component response regulator from P. resinovorans 
NBRC 106553. Further analyses revealed that up to 18 
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paralogs of pnodW genes were present across the genome, 
with similarities of 30-59%. Downstream one ORF 
(AKAUv1_2 130067), we also found a PAS sensor protein 
(sensor kinase; nodV). This sensor kinase had seven paralogs 
in the genome with AAI of 35^6%. Both nodV and nodW 
belong to the two-component regulator family that activates 
the expression of other nodulation genes at the plant sur- 
face and in response to isoflavonoides (Sanjuan et al. 1992; 
Loh et al. 1997, 2002). We could not find other nodulation 
genes (such as nodABCD); however, elsewhere in the 
genome we found CDSs similar to nodN, nodi, nodJ, and 
nodT. The exact role of the two-component regulators 
{nodV and nodW) in strain BSOOl is not known yet, but 
the presence of such an array of nod genes allows the hy- 
pothesis that B. terrae BSOOl might also display a plant-in- 
teractive lifestyle. On the other hand nitrogen fixation genes 
were not present in the genome, excluding a classical nitro- 
gen fixation interaction. 

Detoxification Potential 

The B. terrae BSOOl genome revealed great potential to 
detoxify arsenate, phenylmercuric acetate, methylglyoxal. 



cyanate, fluoroacetate, and superoxide radicals. The gene 
encoding arsenate reductase {arsQ (AKAUv1_1 500002) is 
present on RGP38. Interestingly, two genes (AKAUv1_ 
1500022 and AKAUv1_1 500025) involved in cyanate degra- 
dation are located downstream on RGP38. Moreover, ORF 
AKAUv1_3230012, encoding aliphatic nitrilase that catalyzes 
a reaction in acrylonitrile degradation, was found to be har- 
bored by RGP78. Another putative ORF, AKAUv1_29201 14, 
was harbored by RGP73. It encodes a putative 2-nitropropane 
dioxygenase that is involved in alkylnitronate degradation. We 
also found that strain BSOOl is capable of degrading salicylate, 
anthranilate, benzoate, and catechol. Similarly, proteins for 
degradation of compounds that are released during the deg- 
radation of plant lignin in soil, such as protocatechuate, fer- 
ulate and vanillate, as described for Ralstonia solanacearum 
(Genin and Boucher 2004), are represented by ORFs in the 
genome of B. terrae BSOOl . 

The afore analyses indicate that B. terrae BSOOl has in- 
vested considerable genetic luggage in detoxification systems, 
which may relate to its persistence in ecological niches that are 
affluent in such compounds and may explain its versatile eco- 
logical behavior. 
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Membrane Transporter Landscape 

As gatekeepers to the outside, membrane transporters are 
important for metabolic activities. Lifestyle is thought to be a 
key driver of the evolution in numbers and types of transpor- 
ters (Gelfand and Rodionov 2007). Transporters can be clas- 
sified into three different classes based on their structure and 
mechanism of action, that is, energy-dependent (ATP depen- 
dent) membrane pumps, ion channels, and secondary transpor- 
ters (Nagata et al. 2008). We used TransAPP (Transporter 
Automatic Annotation Pipeline) in the search for membrane 
transporters. A total of 1,338 transporter-like ORFs were 
found (supplementary table S8, Supplementary Material 
online), and so the membrane transporter landscape of 
B. terrae BS001 was different from that of the other 
Burkholderia strains. Burkholderia phymatum STM815 had 
935 transporters, B. rhizoxinica HKI454 267 while according 
to Mitter et al. (2013), B. phytofirmans PsJN has 1,196 mem- 
brane transporters. As the transporter database lists 997 trans- 
porters for B. phytofirmans PsJN, we used this figure for our 
analysis. Across these Burktiolderia speaes, a relatively constant 
percentage of the total genomic space is thus dedicated to 
transporters, and so the larger genomes such as BS001 
(11.1 %), STM81 5(10.8%), and PsJN (1 2. 1 %) have more trans- 
porters compared with smaller ones such as HKI454 (7.1 %). 
Overall, the ATP-dependent systems represented the bulk of 
transporters (648), followed by major facilitator family (MPS) 
(237) membrane transporters. A detailed account of the trans- 
porters acrossallfourSur^to/der/a genomes analyzed isgivenin 
supplementary table S9, Supplementary Material online. Atotal 
of 81 ATP binding component CDSs of ABC transporters as well 
as 55 MPS CDSs (supplementary table S10, Supplementary 
Material online) were randomly picked and analyzed using 
SplitsTree4 (Huson and Bryant 2006). Thus, for both classes of 
transporters, a neighbor-net splits network was generated in 
order to depict different splits indicating potential evolutionary 
events. The ATP binding components of the ABC transporters 
revealed varying splits and radiations, indicating duplications to 
serve different functions in the genome (fig. 8A). Transporters 
of branched-chain amino acids clustered together, as opposed 
to those involved in glycerol-3-phosphate and sugars. Similarly, 
another split indicated that phosphate, Zn/Mn, Fe^"^, and nod 
factor transporters clustered together and are duplicates. Thus, 
a range of duplication events followed by mutational drift and 
selection is at the basis of the current diversity of membrane 
transporters in strain BS001 . Similarly, MPS transporters having 
similar roles clustered together, indicating another suite of du- 
plication events (fig. 86). Thus, expansion of the genome of B. 
terrae BS001 may have been driven mainly by nutritional needs 
in the light of locally fluctuating nutrient availability, for which a 
diversity of transporter systems is essential. Ren and Paulsen 
(2005) indicated that gene duplications might be at the basis 
of expansion, diversification, and selection of distinct transpor- 
ters families. 



Secondary Metabolite Biosynthetic Potential 

Many Burkholderia species exhibit a high potential for second- 
ary metabolite production (Kang et al. 1998; Partida-Martinez 
and Hertweck 2005; Partida-Martinez et al. 2007; Knappe 
et al. 2008; Schmidt et al. 2009; Rohm et al. 2010; 
Seyedsayamdost et al. 2010; Tawfik et al. 2010; 
Mahenthiralingam et al. 2011; Franke et al. 2012). As 
6. terrae BS001 possesses an exceptionally large genome, 
one might predict the ample presence of novel biosynthesis 
gene clusters. Analyses with antiSMASH (Medema et al. 201 1 ; 
Blin et al. 2013) revealed the existence of a locus (14,592 bp) 
that likely encodes a nonribosomal peptide synthetase (NRPS) 
(AKAUv1_1360004-AKAUv1_1 360005) and another one 
(12,975 bp) coding for a hybrid polyketide synthase (PKS)/ 
NRPS enzyme (AKAUv1_1710055, AKAUv1_1710056, 
and AKAUv1_1710057) (fig. 8A). The NRPS locus 
(AKAUv1_1360004-AKAUv1_1 360005) displays similarity 
(63% and 60%, respectively) to an NRPS locus of B. cenoce- 
pacia that is known to encode the siderophore ornibactin 
(Agnoli et al. 2006). As a mycosphere-inhabiting bacterium, 
6. ferrae BS001 is postulated to depend on iron uptake sys- 
tems under conditions of low iron. However, in the predicted 
ornibactin-like NRPS locus, four genes {pvdF [A/^-hydroxyor- 
nithine transformylase], orbS [sigma factor], orfaH [unknown 
function], orbG [a-ketoglutarate-dependent hydroxylases]) 
were missing. Moreover, several transporter genes {orbB, 
orbC, orbD, and orbF) involved in transport and utilization, 
differed from those of the ornibactin locus of B. cenocepacia 
(fig. 96). 

The modular organization of another cluster — a hybrid 
PKS/NRPS — is shown in figure 9A. Homologs of such hybrid 
PKS/NRPS gene clusters have been found in other bacterial 
genomes by BLASTP analyses (Altschul et al. 1990), notably 
in Burkholderia sp. BT03 and 6. phymatum STM815. 
However, no secondary metabolite has been described for 
such gene clusters yet. It would mean a great step fonA/ard 
if the products synthesized by these two clusters could be 
identified. 

The BLAST search for homologs of enzymes of the riboso- 
mally synthesized and posttranslationally modified peptides 
(RiPPs) pathway revealed no hits, suggesting that 6. terrae 
BS001 is not able to produce any such RiPPs. 

Conclusion 

We analyzed the genome of the fungal-interactive 6. terrae 
BS001, which was recently sequenced (Nazir, Hansen, et al. 
201 2). The genome is the largest among all hitherto available 
Burkholderia genomes, and it is equipped with a repertoire of 
genetic systems that encode predicted fungal- as well as plant- 
interactive traits. We thus identified systems potentially in- 
volved in physical interactions (biofilm formation and twitch- 
ing motility) with fungi, next to protein (as well as DNA) 
secretion systems (in particular T3SS) that may modulate 
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Fig. 8. — Neighbor-net splits graplis depicting tine membrane transporters landscape of 6. ferae BS001 . (A) NNet splits graph inferred from alignment of 
81 ATP binding component CDSs (amino acid sequences) of ABC (ATP binding cassette) transporters. (6) NNet splits graph derived from alignment of 55 MFS 
(major facilitator superfamily) CDSs (amino acid sequences). Alignments were manually curated and the resulting "nexus" files were analyzed using 
ProteinMLdist and WAG model (Gamma 4.0) within SplitsTree4 software. The splits indicate that membrane transporters (both ABC and MFS) split out 
and cluster by their respective functions and that gene duplications are at the basis of membrane transporters diversity. The list of CDS analyzed is given in the 
supplementary table S10, Supplementary Material online. 
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of Burkholderia terrae BSOOl . {A) Predicted substrates that get activated by the A-domains, are shown. Orn (Ornithine), Asp (Aspartic acid), Ser (Serine), Val 
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fungal host physiology and unique genes for glycerol uptake 
and metabolism. The genome further contains extensive RGPs 
(16.48% of the genome space), which carry functional genes 
as well as plasmid-related traits, indicating the presumed great 
role of horizontal gene transfer in shaping the genome. 
Moreover, 6. terrae BSOOl apparently invested considerable 
energy in primary (carbohydrate and amino acid) metabolism, 
membrane transporters, and even insect-interactive traits, in- 
dicating a versatile life style in soil. Such life style may have 
included (fungal/plant) host-interactive phases interspersed 
with free-living phases. Concerning the latter, the BSOOl 
genome also contained genes that potentially encode second- 
ary metabolite biosynthesis, which is useful for highly compet- 
itive situations that may occur in the soil upon exploration for 
colonization space. Overall, our analysis unveiled a plethora of 
ecological features in the genome that provide a firm basis for 
further studies based on transcriptomics of ecological situa- 
tions, including interactions with (fungal) hosts. 



Materials and Methods 

Genome Analysis and Annotation 

The genome of B. terrae BSOOl was submitted to MicroScope 
platform that is hosted at Genoscope for analysis. The 
genome data was integrated to PkGDB database of 
MicroScope and we have used the gene locus tags from the 
same database in this study. The genome sequencing data 
had been submitted to NCBI under accession number 
PRJNA1 57903/AKAUOOOOOOOO. 

Pan-Core Genome Computation 

The pan-core genome analysis was computed based on 
single-linkage clustering algorithm in the software package 
siLix (Single Linkage Clustering of Sequences), hosted by 
MicroScope. The algorithm implemented in the siLix is 
based on a principle where a sequence A if considered 



Genome Biol. Evol. 6(7): 1652-1 668. doi:10.1093/gbe/evu126 Advance Access publication June 12, 2014 



1665 



Haq et al. 



GBE 



homologous to sequence B and sequence B is homologous 
to C, then all the sequences (A, B, and C) are clustered in 
a same family, irrespective of the similarity between se- 
quences A and C (Miele et al. 2011). Therefore if a 
gene X is homologous to gene Y, they are clustered to- 
gether and if Y is a homolog of Z, then all three of them 
are grouped into the same MicroScope gene family 
(MICFAM). 

Synteny Calculations 

We employed MicroScope platform to compute synteny of B. 
terrae BS001 using data from PkGDB and NCBI RefSeq data- 
bases. Genes that satisfied BLASTP alignment threshold (35% 
sequence identity over 80% length of the smallest protein) or 
bidirectional best hit criteria, were defined as putative ortho- 
logs. Based on these relationships, synteny groups or syntons 
among other bacterial genomes were searched. The approach 
allows chromosomal rearrangements including insertions, de- 
letions, and inversions. Gap parameter was set to five genes, 
representing maximum number of consecutive genes not part 
of synteny group. 

Bioinformatics and Comparative Genome Analysis 

For comparative analysis of genomes, their metabolic profile 
prediction and pan-core genome analysis and genomic is- 
lands predictions, we used MicroScope platform of the 
Genoscope (http://www.genoscope.cns.fr/agc/microscope/ 
home/index.php, last accessed June 26, 2014). The RGP 
finder was used to detect more than 5-kb synteny breaks 
across a query genome and closely related (comparator) 
ones, followed by screening of the identified region for HGT 
features, such as the presence of mobility genes, tRNA hot 
spots and deviation of the G+C% from the main value. It then 
employed "SIGI-HMM" (based on the use of Hidden Markov 
Models, using codon usage to characterize the origin of 
genes) to measure codon usage aberrations (Waack et al. 
2006) and "AlienHunter," which is an "Interpolated 
Variable Order Motif" that uses variable-order motif distribu- 
tions (2-8) to exploit compositional biases (Vernikos and 
Parkhill 2006). 

Membrane transporters prediction and annotation were 
performed using TransAPP (Transporter Automatic 
Annotation Pipeline) of TransportDB (www.membranetran 
sport.org, last accessed June 26, 2014). Potential type III ef- 
fectors were predicted using www.effectors.org (last accessed 
June 26, 2014) with parameters (classification: [standard set] 
cutoff: [0.9999; selective]), whereas signal peptides were pre- 
dicted using online server SignalP 4.1 (Petersen et al. 2011). 
Antibiotics and secondary metabolites analysis shell 
(antiSMASH) (Blin et al. 2013) online server was used for pre- 
diction and analysis of secondary metabolites. 



Metabolic Pathways Comparison 

The metabolic profiles of B. terrae BS001 and other genomes 
in this study were analyzed using MicroCyc based on MetaCyc 
(Caspi et al. 201 0) of the MicroScope platform, as a reference 
pathway database. For each genome, comparative analysis of 
metabolic pathways was accomplished using MicroScope 
platform (https://vvww.genoscope.cns.fr/agc/microscope/me- 
tabolism, last accessed June 26, 2014). Each pathway has its 
own completion value (completion value = the number of en- 
zymatic reactions in a particular pathway in a given organism/ 
the total number of enzymatic reactions in the same pathway 
in MetaCyc). Hierarchical cluster analysis of the MicroCyc pre- 
dicted metabolic pathways for B. terrae BS001 was performed 
using MeV tool integrated in the MicroScope, taking result 
comparisons as input and using pathway completion values. 
The analysis was performed taking into consideration other 
Burkholderia and a number of additional bacterial genomes 
integrated in the PkGDB database. 

Phylogenetic Analysis 

Phylogenetic trees were made with MEGA 6 (Tamura et al. 
2013). Poorly aligned regions and gaps were manually elimi- 
nated. Maximum-likelihood method based on the JTT matrix- 
based model (Jones et al. 1992) was used to infer the 
evolutionary history for the T3SS, T4SS, ACC-deaminase, 
and tyrosinase genes. For modeling of evolutionary rate dif- 
ferences among sites discrete Gamma distribution was used (4 
categories [+G, parameter = 3.8674]). For network analysis of 
membrane transporters, amino acid sequences of random 
ORFs (ATP binding component of ABC and MFS transporters) 
were aligned using ClustalW in MEGA. The matrix was then 
analyzed in SplitsTree4 software using WAG model with 
Gamma 4.0 and ProteinMLdist (characters). 

Supplementary Material 

Supplementary figures S1-S5 and tables SI-SI 0 are available 
at Genome Biology and Evolution online (http://vvvvw.gbe. 
oxfordjournals.org/). 
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